BUG: Don't raise in DataFrame.corr with pd.NA #33809
pandas/core/frame.py
Outdated
mat = numeric_df.to_numpy()
if is_object_dtype(mat.dtype):
    # We end up with an object array if pd.NA is present
    mat[isna(mat)] = np.nan
This feels like a hack. It seems like it should be possible to cast the whole DataFrame to float using to_numpy, but that doesn't seem to work.
pass na_value=np.nan
I was trying that originally but it turns out DataFrame.to_numpy doesn't have an na_value argument (seems only Series and EAs do)
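To illustrate the point about Series.to_numpy accepting na_value, here is a minimal sketch. The per-column stacking at the end is only a hypothetical fallback for a DataFrame, not what this PR does, and the names s, df, arr and mat are made up for the example.

```python
import numpy as np
import pandas as pd

# A nullable-integer Series holding one missing value (pd.NA).
s = pd.Series([1, 2, None], dtype="Int64")

# Series.to_numpy accepts na_value, so pd.NA can be mapped to np.nan
# while converting directly to a float array.
arr = s.to_numpy(dtype="float64", na_value=np.nan)

# Hypothetical per-column fallback for a DataFrame (an assumption for
# illustration): convert each column separately and stack the results.
df = pd.DataFrame({"a": s, "b": pd.Series([4, None, 6], dtype="Int64")})
mat = np.column_stack(
    [df[col].to_numpy(dtype="float64", na_value=np.nan) for col in df.columns]
)
```

Both arr and mat should come out as float64 arrays with np.nan where pd.NA was.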
hmm @TomAugspurger
We'll want to avoid going through object dtype at all. Can you specify dtype="float"? Or does that break other things?
That raises an error:
TypeError: float() argument must be a string or a number, not 'NAType'
That matches the behavior of Series.to_numpy(). We need to have the NA value passed through as np.nan, I think.
I think, as a workaround, casting with astype(float) and then calling to_numpy works; later we could use na_value in DataFrame.to_numpy directly?
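A small sketch of that workaround, assuming a frame whose columns use the nullable Int64 dtype (the frame and variable names are invented for the example):

```python
import numpy as np
import pandas as pd

# Two nullable-integer columns, each with one pd.NA.
df = pd.DataFrame(
    {
        "a": pd.array([1, 2, None], dtype="Int64"),
        "b": pd.array([None, 5, 6], dtype="Int64"),
    }
)

# Cast to float first so pd.NA becomes np.nan, then convert to a NumPy
# array. The result is a plain float64 array, never an object array.
mat = df.astype(float).to_numpy()
```

The cast happens inside pandas, so the intermediate object array and the manual `mat[isna(mat)] = np.nan` patch-up are not needed.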
can you also use .convert_dtypes() (e.g. like the OP) in the tests
Hmm, I think we wouldn't want that in the test because the functionality we're testing is corr / cov rather than convert_dtypes (the original bug report wasn't minimal; convert_dtypes was working okay)?
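For reference, a test that exercises corr / cov directly, without convert_dtypes, might look roughly like the sketch below. This is an assumption about the shape of such a test, not the test added in this PR; the function name and data are hypothetical.

```python
import pandas as pd


def check_corr_cov_with_nullable_columns():
    # Hypothetical test: build the nullable columns directly so that
    # only corr / cov are under test.
    df = pd.DataFrame(
        {
            "a": pd.array([1, 2, None, 4, 5], dtype="Int64"),
            "b": pd.array([2, 1, 4, None, 7], dtype="Int64"),
        }
    )
    # The same data as plain floats, with np.nan in place of pd.NA.
    expected_input = df.astype(float)

    # corr and cov should not raise and should match the float results.
    pd.testing.assert_frame_equal(df.corr(), expected_input.corr())
    pd.testing.assert_frame_equal(df.cov(), expected_input.cov())
    return df.corr()


result = check_corr_cov_with_nullable_columns()
```

Comparing against the float-cast frame avoids hard-coding expected coefficients.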
pandas/core/frame.py
Outdated
@@ -7871,16 +7870,16 @@ def corr(self, method="pearson", min_periods=1) -> "DataFrame":
 numeric_df = self._get_numeric_data()
 cols = numeric_df.columns
 idx = cols.copy()
-mat = numeric_df.values
+mat = numeric_df.astype(float).to_numpy()
can you add copy=False here
thanks @dsaxton
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff